A Study on Load Imbalance in Parallel Hypermatrix Multiplication Using OpenMP
نویسندگان
چکیده
In this paper we present our work on the the parallelization of a matrix multiplication code based on the hypermatrix data structure. We have used OpenMP for the parallelization. We have added OpenMP directives to a few loops and experimented with several features available with OpenMP in the Intel Fortran Compiler: scheduling algorithms, chunk sizes and nested parallelism. We found that the load imbalance introduced by the hypermatrix structure could not be solved by any of those OpenMP features.
منابع مشابه
UTS: An Unbalanced Tree Search Benchmark
This paper presents an unbalanced tree search (UTS) benchmark designed to evaluate the performance and ease of programming for parallel applications requiring dynamic load balancing. We describe algorithms for building a variety of unbalanced search trees to simulate different forms of load imbalance. We created versions of UTS in two parallel languages, OpenMP and Unified Parallel C (UPC), usi...
متن کاملParallel computing using MPI and OpenMP on self-configured platform, UMZHPC.
Parallel computing is a topic of interest for a broad scientific community since it facilitates many time-consuming algorithms in different application domains.In this paper, we introduce a novel platform for parallel computing by using MPI and OpenMP programming languages based on set of networked PCs. UMZHPC is a free Linux-based parallel computing infrastructure that has been developed to cr...
متن کاملAchieving Efficient Strong Scaling with PETSc Using Hybrid MPI/OpenMP Optimisation
The increasing number of processing elements and decreasing memory to core ratio in modern high-performance platforms makes efficient strong scaling a key requirement for numerical algorithms. In order to achieve efficient scalability on massively parallel systems scientific software must evolve across the entire stack to exploit the multiple levels of parallelism exposed in modern architecture...
متن کاملHandling Transient and Persistent Imbalance Together in Distributed and Shared Memory
The recent trend of rapid increase in the number of cores per chip has resulted in vast amount of on-node parallelism. Not only the number of cores per node is increasing substantially but also the cores are becoming heterogeneous. The high variability in the performance of the hardware components introduce imbalance due to heterogeneity. The applications are also becoming more complex resultin...
متن کاملA Comparative Study on Performance Benefits of Multi-core CPUs using OpenMP
Achieving scalable parallelism from general programs was not successful to this point. To extract parallelism from programs has become the key focus of interest on multi-core CPUs. There are many techniques and programming models such as MPI, CUDA and OpenMP adopted in order to exploit more performance. But there is an urge to find the best parallel programming techniques for the benefit of per...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005